{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Category型与离散化\n",
    "类别类型可谓是非常常用的一种类型，其具有如下特征：\n",
    "1. 取固定几种值；\n",
    "2. 可以定义序，序的形式与实数序或字典序可以都不同；\n",
    "3. 即使是数值表示，数值运算可能也无意义，与离散数值型不一定相同。\n",
    "\n",
    "__auther__ = 'zhenhang.sun@gmail.com'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'D:\\\\新建文件夹\\\\pandas-tutorial'"
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pwd"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "# 1. 创建"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1.1 创建Category的类\n",
    "#### `pd.Categorical(values, categories=None, ordered=False)`\n",
    "- values: 类别序列；\n",
    "- categories：自定义的类别序列；\n",
    "- ordered：类别是否定义顺序，默认增序。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[2, 1, 1, 3]\n",
       "Categories (3, int64): [1 < 2 < 3]"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "c = pd.Categorical([2,1,1,3], ordered = True ) \n",
    "c\n",
    "# 不提供categories，则用values去重后的值作为类别\n",
    "# 若ordered =True，顺序则按照字典序升序给定"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[NaN, 2.0, 3.0]\n",
       "Categories (2, int64): [3 < 2]"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "c = pd.Categorical([1,2,3], categories = [3,2], ordered = True )\n",
    "c\n",
    "# 提供categories（类别不能有重复，否则报错），若values的值不在categories中，则用NaN替换\n",
    "# 若ordered =True，顺序则按照类别顺序升序给定"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 类别的两个重要属性 "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Int64Index([3, 2], dtype='int64')"
      ]
     },
     "execution_count": 39,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "c.categories  # 类别"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 40,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "c.ordered # 是否有序"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1.2 转换为类别类型"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 41,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    2\n",
       "1    1\n",
       "2    1\n",
       "3    3\n",
       "dtype: int64"
      ]
     },
     "execution_count": 41,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s = pd.Series([2,1,1,3])\n",
    "s"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    2\n",
       "1    1\n",
       "2    1\n",
       "3    3\n",
       "dtype: category\n",
       "Categories (3, int64): [1, 2, 3]"
      ]
     },
     "execution_count": 42,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s = s.astype('category')  \n",
    "s   #可以看到dtype已经变成category型 "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Series查看类型属性需要通过`.cat`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Int64Index([1, 2, 3], dtype='int64')"
      ]
     },
     "execution_count": 43,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s.cat.categories"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "False"
      ]
     },
     "execution_count": 44,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s.cat.ordered"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "# 2. 查、改、增、删"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.1 查"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### `[]` 四种查看方式\n",
    "类别类型是序列形式，可以采用`[]`来查看，不过`.loc[]`和`.iloc[]`都是不支持的。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "nan"
      ]
     },
     "execution_count": 45,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "c[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[NaN, 2.0]\n",
       "Categories (2, int64): [3 < 2]"
      ]
     },
     "execution_count": 46,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "c[0:2]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[NaN, 3.0]\n",
       "Categories (2, int64): [3 < 2]"
      ]
     },
     "execution_count": 47,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "c[[0,2]]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[NaN, 3.0]\n",
       "Categories (2, int64): [3 < 2]"
      ]
     },
     "execution_count": 48,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mask = [True, False, True]\n",
    "c[mask]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.2 改"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.2.1 改类别值\n",
    "这个功能用得会比较多，将字符串类别映射为数值类别。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 直接修改"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "c1 = c.copy()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[NaN, 6, 5]\n",
       "Categories (2, object): [5 < 6]"
      ]
     },
     "execution_count": 49,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "c1.categories = ['5','6']   # 这种改法，新的类别序列与旧类别序列长度必须相同，实质为将值和类型依次替换\n",
    "c1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    2\n",
       "1    1\n",
       "2    1\n",
       "3    3\n",
       "dtype: category\n",
       "Categories (3, int64): [1, 2, 3]"
      ]
     },
     "execution_count": 50,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s1= s.copy()\n",
    "s1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    5\n",
       "1    6\n",
       "2    6\n",
       "3    7\n",
       "dtype: category\n",
       "Categories (3, int64): [6, 5, 7]"
      ]
     },
     "execution_count": 51,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s1.cat.categories = [6,5,7]\n",
    "s1 # 对Series来说，用.cat操作改法是相同的"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### 函数改\n",
    "#### `categories.rename_categories(cat , inplace = False)`\n",
    "- cat：新的类别，必须和旧类别长度相同；\n",
    "- inplace：True or False，是否原地修改。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[NaN, 6, 5]\n",
       "Categories (2, object): [5 < 6]"
      ]
     },
     "execution_count": 52,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "c1 = c.copy()\n",
    "c1.rename_categories(['5','6'], inplace = True)  #和上面完全相同\n",
    "c1 "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    5\n",
       "1    6\n",
       "2    6\n",
       "3    7\n",
       "dtype: category\n",
       "Categories (3, object): [6, 5, 7]"
      ]
     },
     "execution_count": 54,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s1 = s.copy()\n",
    "s1.cat.rename_categories(['6','5','7'], inplace = True) # 和上面完全相同\n",
    "s1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.2.2 有序、无序转变\n",
    "#### `categories.as_ordered(inplace = False)`\n",
    "#### `categories.as_unordered(inplace = False)`\n",
    "- inplace：True or False，是否原地修改。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 122,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[NaN, 2.0, 3.0]\n",
       "Categories (2, int64): [3 < 2]"
      ]
     },
     "execution_count": 122,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "c1 = c.copy()\n",
    "c1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 124,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[NaN, 2.0, 3.0]\n",
       "Categories (2, int64): [3, 2]"
      ]
     },
     "execution_count": 124,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "c1.as_unordered(inplace = True)\n",
    "c1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 126,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[NaN, 2.0, 3.0]\n",
       "Categories (2, int64): [3 < 2]"
      ]
     },
     "execution_count": 126,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "c1.as_ordered()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.2.3 有序改变顺序\n",
    "#### `categories.reorder_categories(cat , ordered = False，inplace = False)`\n",
    "- cat：只能是旧类别改变顺序后的序列，不能增减类别；\n",
    "- ordered：True or False，类别是否有序\n",
    "- inplace：True or False，是否原地修改。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[NaN, 2.0, 3.0]\n",
       "Categories (2, int64): [3 < 2]"
      ]
     },
     "execution_count": 60,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "c1 = c.copy()\n",
    "c1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 63,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[NaN, 2.0, 3.0]\n",
       "Categories (2, int64): [2 < 3]"
      ]
     },
     "execution_count": 63,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "c1.reorder_categories([2,3],ordered = True,inplace = True)\n",
    "c1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.3 增"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### `categories.add_categories(cat，inplace = False)`\n",
    "- cat：想要新增的类别，必须不在旧类别中；\n",
    "- inplace：True or False，是否原地修改。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 71,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[NaN, 2.0, 3.0]\n",
       "Categories (2, int64): [3 < 2]"
      ]
     },
     "execution_count": 71,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "c1 = c.copy()\n",
    "c1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 74,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[NaN, 2.0, 3.0]\n",
       "Categories (4, int64): [3 < 2 < 4 < 5]"
      ]
     },
     "execution_count": 74,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "c1.add_categories([4,5], inplace = True)\n",
    "c1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.4 删"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.4.1 删除任意不需要的类别\n",
    "#### `categories.remove_categories(cat，inplace = False)`\n",
    "- cat：想要删除的类别，必须在旧类别中；\n",
    "- inplace：True or False，是否原地修改。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 75,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[NaN, 2.0, 3.0]\n",
       "Categories (3, int64): [3 < 2 < 5]"
      ]
     },
     "execution_count": 75,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "c1.remove_categories([4],inplace = True)\n",
    "c1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2..4.2 去除没有使用的类别\n",
    "####  `categories.remove_unused_categories(inplace = False)`\n",
    "- inplace：True or False，是否原地修改。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 76,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[NaN, 2.0, 3.0]\n",
       "Categories (2, int64): [3 < 2]"
      ]
     },
     "execution_count": 76,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "c1.remove_unused_categories(inplace = True)  # 类别 5 被去除\n",
    "c1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2.5 改增删 三合一\n",
    "#### `categories.set_categories(cat , ordered = False，rename = False, inplace = False)`\n",
    "- cat：只能是旧类别改变顺序后的序列，不能增减类别；\n",
    "- ordered：True or False，改序，如果提供这一项，保持原来属性，最好明确给出；\n",
    "- rename：True or False，改名，这个参数我发现没啥用（？）；\n",
    "- inplace：True or False，是否原地修改。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 84,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[NaN, 2.0, 3.0]\n",
       "Categories (2, int64): [3 < 2]"
      ]
     },
     "execution_count": 84,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "c1 = c.copy()\n",
    "c1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 86,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[NaN, 2.0, NaN]\n",
       "Categories (3, int64): [2 < 4 < 5]"
      ]
     },
     "execution_count": 86,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "c1.set_categories([2,4,5], ordered = True, inplace = True) # 删除了旧类别 1，增加新类别4、5,\n",
    "c1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---\n",
    "# 3. `cut() 和 qcut()`\n",
    "这俩函数用于将连续型变量分割为类别变量。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3.1 `cut()`\n",
    "#### `pd.cut(x, bins, right = False,include_lowest=False, labels=None, retbins=False)`\n",
    "- x：待分割的Series或序列；\n",
    "- bins：如果是int，那么将Series的进行等分，并在最大最小值的基础上外延1%作为区间边界；如果是序列，那么将序列值作为分隔点；\n",
    "- right：True or False，分隔区间默认为左闭右开；\n",
    "- include_lowest：True or False，将最左侧区间的左值外延1%，试图去包含最小值；\n",
    "- labels：分隔后是区间，可以用label来替换为想要的类别形式；\n",
    "- retbins：是否返回分隔点；"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 117,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    0\n",
       "1    1\n",
       "2    2\n",
       "3    3\n",
       "4    4\n",
       "dtype: int32"
      ]
     },
     "execution_count": 117,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "s = pd.Series( range(0,5))\n",
    "s"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 88,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    (-0.004, 1.333]\n",
       "1    (-0.004, 1.333]\n",
       "2     (1.333, 2.667]\n",
       "3       (2.667, 4.0]\n",
       "4       (2.667, 4.0]\n",
       "dtype: category\n",
       "Categories (3, interval[float64]): [(-0.004, 1.333] < (1.333, 2.667] < (2.667, 4.0]]"
      ]
     },
     "execution_count": 88,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.cut( s, 3)   # 可以看到一共3个类别，类别形式为区间形式(]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 89,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    a\n",
       "1    a\n",
       "2    b\n",
       "3    c\n",
       "4    c\n",
       "dtype: category\n",
       "Categories (3, object): [a < b < c]"
      ]
     },
     "execution_count": 89,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.cut( s, 3, labels = ['a','b','c'])   # 这样就清晰多了"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 90,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(0    a\n",
       " 1    a\n",
       " 2    b\n",
       " 3    c\n",
       " 4    c\n",
       " dtype: category\n",
       " Categories (3, object): [a < b < c],\n",
       " array([-0.004     ,  1.33333333,  2.66666667,  4.        ]))"
      ]
     },
     "execution_count": 90,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.cut( s, 3, labels = ['a','b','c'], retbins = True)   # 分隔点也返回"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 99,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    [0.0, 2.5)\n",
       "1    [0.0, 2.5)\n",
       "2    [0.0, 2.5)\n",
       "3    [2.5, 4.0)\n",
       "4           NaN\n",
       "dtype: category\n",
       "Categories (2, interval[float64]): [[0.0, 2.5) < [2.5, 4.0)]"
      ]
     },
     "execution_count": 99,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.cut(s,[0,2.5,4], right = False)  # 左闭右开，不包括4，所以4不属于任何一类别"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 100,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0           NaN\n",
       "1    (0.0, 2.5]\n",
       "2    (0.0, 2.5]\n",
       "3    (2.5, 4.0]\n",
       "4    (2.5, 4.0]\n",
       "dtype: category\n",
       "Categories (2, interval[float64]): [(0.0, 2.5] < (2.5, 4.0]]"
      ]
     },
     "execution_count": 100,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.cut(s,[0,2.5,4], right = True)  # 左开右闭，不包括0，所以0不属于任何一类别"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 102,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    (-0.001, 2.5]\n",
       "1    (-0.001, 2.5]\n",
       "2    (-0.001, 2.5]\n",
       "3       (2.5, 4.0]\n",
       "4       (2.5, 4.0]\n",
       "dtype: category\n",
       "Categories (2, interval[float64]): [(-0.001, 2.5] < (2.5, 4.0]]"
      ]
     },
     "execution_count": 102,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.cut(s,[0,2.5,4], right = True, include_lowest = True)  # 最左侧值被包含"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3.2 `qcut()`\n",
    "#### `pd.qcut(x, q,  labels=None, retbins=False)`\n",
    "- x：待分割的Series或序列；\n",
    "- q：安装分位数也来定义分隔点，而不是按照给定值；\n",
    "- labels：分隔后是区间，可以用label来替换为想要的类别形式；\n",
    "- retbins：是否返回分隔点；"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 118,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    a\n",
       "1    a\n",
       "2    b\n",
       "3    c\n",
       "4    d\n",
       "dtype: category\n",
       "Categories (4, object): [a < b < c < d]"
      ]
     },
     "execution_count": 118,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.qcut(s, q = [0.0, 0.25, 0.5,0.75, 1.0], labels =['a','b','c','d']) \n",
    "# 5个分位点，形成 4 个区间。看来默认参数是right =True， include_lowest = True"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.0"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {
    "height": "318px",
    "width": "252px"
   },
   "number_sections": false,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": "block",
   "toc_window_display": true
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
